CLR & Garbage Collector (GC) Practical Example
⚡ GOAL
Parse and process a stream of price tick data efficiently: minimal allocations, no temporary strings, and a GC-friendly design.
---
🧩 1. The scenario
You receive a TCP stream of bytes like this:
```
EURUSD,1.07432,1.07436
GBPUSD,1.24587,1.24592
```
You need to:
- Parse each line into fields (symbol, bid, ask)
- Convert to typed data (`struct Tick`)
- Reuse buffers instead of new allocations
- Avoid string allocations except for the final symbol if needed
---
💻 2. The C# code
```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

struct Tick
{
    public string Symbol { get; init; }
    public double Bid { get; init; }
    public double Ask { get; init; }
}

class TickParser
{
    private readonly ArrayPool<byte> _pool = ArrayPool<byte>.Shared;

    public Tick Parse(ReadOnlySpan<byte> line)
    {
        // EURUSD,1.07432,1.07436
        int firstComma = line.IndexOf((byte)',');
        int secondComma = line.Slice(firstComma + 1).IndexOf((byte)',') + firstComma + 1;

        // Symbol bytes -> string (the only allocation per line)
        string symbol = Encoding.ASCII.GetString(line[..firstComma]);

        // Parse bid and ask directly from the UTF-8 bytes, no intermediate strings
        Utf8Parser.TryParse(line[(firstComma + 1)..secondComma], out double bid, out _);
        Utf8Parser.TryParse(line[(secondComma + 1)..], out double ask, out _);

        return new Tick { Symbol = symbol, Bid = bid, Ask = ask };
    }

    public void ProcessBatch(ReadOnlySpan<byte> data)
    {
        var span = data;
        while (true)
        {
            int newline = span.IndexOf((byte)'\n');
            if (newline == -1) break;

            var line = span[..newline];
            var tick = Parse(line);

            // Do something: e.g., publish to queue
            Console.WriteLine($"{tick.Symbol}: {tick.Bid} / {tick.Ask}");

            span = span[(newline + 1)..];
        }
    }

    public void Run()
    {
        byte[] buffer = _pool.Rent(1024);
        try
        {
            string sample = "EURUSD,1.07432,1.07436\nGBPUSD,1.24587,1.24592\n";
            int bytes = Encoding.ASCII.GetBytes(sample, buffer);

            // Pass a slice of the rented buffer directly; no defensive copy
            ProcessBatch(buffer.AsSpan(0, bytes));
        }
        finally
        {
            _pool.Return(buffer);
        }
    }
}

class Program
{
    static void Main()
    {
        var parser = new TickParser();
        parser.Run();
    }
}
```
---
🧠 3. What to highlight in the interview
Memory-efficient design:
- Uses `ReadOnlySpan<byte>` to slice input lines → no string splits or temporary arrays.
- `Utf8Parser` parses directly from bytes into doubles → no string allocations.
- `ArrayPool<byte>` reuses buffers → avoids per-message allocation pressure on Gen0.
- Only one small string allocation per line (`Symbol`), which could also be interned or replaced by a dictionary of symbols in a real system.
GC impact:
- Minimal Gen0 churn; no Gen1/Gen2 or LOH allocations.
- Perfect candidate for low-latency message processing (market data, trade events).
- If you wrap this in an async TCP stream, you’d use `Memory<byte>` instead of `Span<byte>` to cross `await` boundaries safely (see the sketch just below).
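Here is a minimal sketch of that async shape, reusing the `TickParser` from section 2. The `ReadLoopAsync` name and the 4 KB buffer are illustrative, and it assumes each read ends on a newline; split frames are covered in the Q&A further down.

```csharp
using System;
using System.Buffers;
using System.Net.Sockets;
using System.Threading.Tasks;

static async Task ReadLoopAsync(NetworkStream stream, TickParser parser)
{
    byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
    try
    {
        int read;
        // Memory<byte> is not a ref struct, so it can be held across the await.
        while ((read = await stream.ReadAsync(buffer.AsMemory(0, buffer.Length))) > 0)
        {
            // Drop back to Span<byte> only in synchronous code after the await.
            parser.ProcessBatch(buffer.AsSpan(0, read));
        }
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
}
```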
Potential improvements to mention:
- Pool parsed `Tick` objects if needed (e.g., `ObjectPool<Tick>` or reuse structs).
- Replace the per-line symbol string allocation with a symbol lookup table (sketched after this list).
- Integrate with System.IO.Pipelines for streaming input.
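A minimal sketch of that lookup-table idea, assuming the symbol universe is known up front. The `SymbolTable` type and its linear scan are assumptions for illustration; in `Parse`, `Encoding.ASCII.GetString(line[..firstComma])` would become `table.Lookup(line[..firstComma])`.

```csharp
using System;
using System.Text;

// Maps raw symbol bytes to pre-allocated strings so repeated symbols never allocate.
sealed class SymbolTable
{
    private readonly byte[][] _keys;
    private readonly string[] _values;

    public SymbolTable(params string[] symbols)
    {
        _keys = Array.ConvertAll(symbols, s => Encoding.ASCII.GetBytes(s));
        _values = symbols;
    }

    public string Lookup(ReadOnlySpan<byte> raw)
    {
        for (int i = 0; i < _keys.Length; i++)
        {
            if (raw.SequenceEqual(_keys[i]))
                return _values[i];                // reuse the cached string
        }
        return Encoding.ASCII.GetString(raw);     // unknown symbol: fall back to allocating
    }
}
```

`new SymbolTable("EURUSD", "GBPUSD")` covers the sample feed; a real system would load the instrument list at startup.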
---
🔍 4. Practice questions you can expect
- “Why is `Span<T>` faster than using `Split` or `string.Substring()`?”
  → Because it slices existing memory without allocating.
- “Why use `ArrayPool<byte>` instead of `new byte[]`?”
  → It reuses buffers, drastically reducing GC pressure in high-throughput systems.
- “Can you use `Span<T>` inside an async method?”
  → No, because `Span<T>` is stack-only. Use `Memory<T>` instead.
- “What generation would this data typically live in?”
  → The short-lived per-tick allocations die in Gen0 and are never promoted, which is ideal for throughput.
- “What happens if we allocate a 100 KB buffer?”
  → It goes on the Large Object Heap (LOH), which isn’t compacted by default and can fragment memory.
---
🧩 5. Optional 5-min extension: Counter check
If you have the .NET SDK, start the parser, then attach from another terminal (replace `<pid>` with the parser's process id):

```
dotnet-counters monitor --process-id <pid> System.Runtime
```

Watch:

- `Gen0 GC Count` barely increases
- `Gen2 GC Count` stays at 0
- `Allocated Bytes/sec` stays minimal
That’s your evidence you’ve optimized allocations correctly.
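For an in-process check without external tools, a rough sketch along these lines works too. It assumes the `TickParser` from section 2 is in scope, and the numbers are only meaningful if you first comment out the `Console.WriteLine` in `ProcessBatch`, since interpolated strings allocate:

```csharp
using System;

var parser = new TickParser();

long bytesBefore = GC.GetAllocatedBytesForCurrentThread();
int gen0Before = GC.CollectionCount(0);

for (int i = 0; i < 100_000; i++)
    parser.Run();

Console.WriteLine($"Allocated bytes:  {GC.GetAllocatedBytesForCurrentThread() - bytesBefore}");
Console.WriteLine($"Gen0 collections: {GC.CollectionCount(0) - gen0Before}");
Console.WriteLine($"Gen2 collections: {GC.CollectionCount(2)}");
```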
---
Would you like me to follow up with a bonus version using System.IO.Pipelines, the way you’d implement this in a real backend (async, streaming, zero-copy)? That’s often a senior-level differentiator question.
---
Questions & Answers
Q: How does this parser minimize allocations?
A: It uses ReadOnlySpan<byte> to slice the input buffer, Utf8Parser to parse primitives directly from bytes, and ArrayPool<byte> to reuse buffers, leaving only one small string allocation per tick.
Q: Why is Utf8Parser preferred here?
A: It avoids converting byte segments into strings before parsing numbers, which eliminates temporary allocations, and it parses with invariant-culture semantics, a good fit for machine-generated market data.
Q: How do you process partial messages with this approach?
A: Carry leftover bytes between reads (copy the unprocessed tail to the front of the buffer, as sketched below), or use System.IO.Pipelines, which handles split frames by giving you a ReadOnlySequence<byte> to process once a newline delimiter appears.
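A hedged sketch of the leftover-bytes approach, assuming the `TickParser` from section 2. The `ProcessStream` helper and 4 KB buffer are illustrative, and a single line longer than the buffer would need a resize that is not shown:

```csharp
using System;
using System.Buffers;
using System.IO;

static void ProcessStream(Stream stream, TickParser parser)
{
    byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
    int leftover = 0;                       // bytes of an incomplete line carried over
    try
    {
        int read;
        while ((read = stream.Read(buffer, leftover, buffer.Length - leftover)) > 0)
        {
            var filled = buffer.AsSpan(0, leftover + read);
            int lastNewline = filled.LastIndexOf((byte)'\n');
            if (lastNewline == -1) { leftover = filled.Length; continue; }

            // Parse every complete line, including the final '\n'.
            parser.ProcessBatch(filled[..(lastNewline + 1)]);

            // Shift the incomplete tail to the front so the next read appends after it.
            var tail = filled[(lastNewline + 1)..];
            tail.CopyTo(buffer);
            leftover = tail.Length;
        }
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
}
```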
Q: How would you adapt this for asynchronous sockets?
A: Swap buffer handling for PipeReader/PipeWriter, consume ReadOnlySequence<byte> segments, and use Memory<byte> to cross await boundaries safely while keeping parsing logic identical.
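A rough sketch of that Pipelines shape, again assuming the `TickParser` from section 2. The `PipelineConsumer` class and helper names are assumptions, the `stackalloc` fallback assumes tick lines stay small, and a plain console app may need the System.IO.Pipelines package:

```csharp
using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Threading.Tasks;

static class PipelineConsumer
{
    public static async Task ConsumeAsync(PipeReader reader, TickParser parser)
    {
        while (true)
        {
            ReadResult result = await reader.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;

            while (TryReadLine(ref buffer, out ReadOnlySequence<byte> line))
            {
                Tick tick = ParseLine(line, parser);
                // ... publish tick ...
            }

            // Everything before buffer.Start is consumed; the rest was examined
            // and is handed back (plus new bytes) on the next ReadAsync.
            reader.AdvanceTo(buffer.Start, buffer.End);
            if (result.IsCompleted) break;
        }
        await reader.CompleteAsync();
    }

    private static bool TryReadLine(ref ReadOnlySequence<byte> buffer, out ReadOnlySequence<byte> line)
    {
        SequencePosition? newline = buffer.PositionOf((byte)'\n');
        if (newline is null) { line = default; return false; }

        line = buffer.Slice(0, newline.Value);
        buffer = buffer.Slice(buffer.GetPosition(1, newline.Value)); // skip '\n'
        return true;
    }

    private static Tick ParseLine(in ReadOnlySequence<byte> line, TickParser parser)
    {
        // Fast path: the line sits in one segment and can be parsed in place.
        if (line.IsSingleSegment)
            return parser.Parse(line.FirstSpan);

        // Rare path: the line straddles segments, so copy it to a small scratch buffer.
        Span<byte> scratch = stackalloc byte[(int)line.Length];
        line.CopyTo(scratch);
        return parser.Parse(scratch);
    }
}
```

`AdvanceTo(buffer.Start, buffer.End)` is what makes split frames work: unconsumed bytes stay buffered inside the pipe and reappear on the next read.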
Q: What GC metrics confirm success?
A: Gen0 GC Count stays low relative to throughput, Gen2 GC Count remains near zero, and Allocated Bytes/sec is minimal. Use dotnet-counters to observe these while the parser runs.
Q: How do you reuse symbol strings to avoid per-message allocations?
A: Maintain a dictionary of interned symbols or use ReadOnlyMemory<byte> pointing to shared symbol tables, so repeated symbols reuse references instead of allocating new strings.
Q: What happens if you allocate 100 KB buffers per message?
A: They land on the LOH, leading to fragmentation and long Gen2 pauses. Renting from ArrayPool<byte> avoids repeated LOH allocations.
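A two-line check demonstrates the threshold (85,000 bytes is the default cutoff, and `GC.GetGeneration` reports LOH objects as generation 2):

```csharp
Console.WriteLine(GC.GetGeneration(new byte[100 * 1024])); // 100 KB > 85,000 bytes -> LOH, reported as 2
Console.WriteLine(GC.GetGeneration(new byte[1_024]));      // small object -> 0 (starts in Gen0)
```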
Q: How do you handle parsing failures?
A: Check the boolean return from Utf8Parser.TryParse and decide whether to drop/log the tick or route it to a poison queue. Avoid throwing in the hot path to keep allocation-free behavior.
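A sketch of that defensive path as a `TryParse` method you could add to `TickParser` (the method is an assumption, not part of the code in section 2):

```csharp
public bool TryParse(ReadOnlySpan<byte> line, out Tick tick)
{
    tick = default;

    int firstComma = line.IndexOf((byte)',');
    if (firstComma <= 0) return false;                 // missing or empty symbol

    int secondComma = line.Slice(firstComma + 1).IndexOf((byte)',');
    if (secondComma < 0) return false;                 // missing second field
    secondComma += firstComma + 1;

    if (!Utf8Parser.TryParse(line[(firstComma + 1)..secondComma], out double bid, out _)) return false;
    if (!Utf8Parser.TryParse(line[(secondComma + 1)..], out double ask, out _)) return false;

    tick = new Tick
    {
        Symbol = Encoding.ASCII.GetString(line[..firstComma]),
        Bid = bid,
        Ask = ask
    };
    return true;
}
```

Callers count or log the rejected line (ideally without formatting a string on the hot path) and move on.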
Q: Can you pool Tick instances too?
A: As written, Tick is a struct, so instances are copied by value and don't need pooling. If you turn it into a class (for example, to attach more state), use ObjectPool<Tick> and ensure pooled objects are reset before reuse to avoid leaking data.
Q: How would you extend this to publish to RabbitMQ/Kafka?
A: Serialize ticks using IBufferWriter<byte> or spans to keep serialization allocation-free, then push to the broker client that supports span-friendly APIs.
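A minimal sketch of that serialization side, writing the same `SYMBOL,bid,ask\n` wire format into any `IBufferWriter<byte>`; the `WriteTick` helper and the 64-byte size hint are assumptions:

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

static void WriteTick(IBufferWriter<byte> writer, in Tick tick)
{
    // Ask the writer for scratch space; 64 bytes comfortably fits one tick line.
    Span<byte> span = writer.GetSpan(64);

    int written = Encoding.ASCII.GetBytes(tick.Symbol, span);
    span[written++] = (byte)',';

    Utf8Formatter.TryFormat(tick.Bid, span[written..], out int len);
    written += len;
    span[written++] = (byte)',';

    Utf8Formatter.TryFormat(tick.Ask, span[written..], out len);
    written += len;
    span[written++] = (byte)'\n';

    writer.Advance(written);                // commit exactly the bytes we wrote
}
```

An `ArrayBufferWriter<byte>` works for testing; in production the same method writes straight into a `PipeWriter` or any broker client that accepts span/memory payloads before flushing.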